Forward stepwise: In this approach, confounder covariates are added to the model one at a time, over a series of iterative models. If a covariate does not meet the rules to be kept in the model, it is removed and never considered again. Imagine you were fitting a regression model with one exposure covariate and eight candidate confounders. Suppose that you add the first confounder along with the exposure and it meets the modeling rules, so you keep it. But when you add the second confounder, it does not meet the rules, so you leave it out. You keep doing this until you run out of candidates. Although forward stepwise can work if you have very few variables, most analysts do not use this approach because it has been shown to be sensitive to the order in which you choose to enter the variables.
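To make the process concrete, here is a minimal sketch of the forward stepwise routine described above, assuming a linear model fit with statsmodels on simulated data; the variable names, the made-up data, and the p < 0.05 "keep" rule are illustrative assumptions, not recommendations.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Simulated data: one exposure, eight candidate confounders, one outcome
rng = np.random.default_rng(0)
n = 500
confounder_names = [f"conf{i}" for i in range(1, 9)]
df = pd.DataFrame(rng.normal(size=(n, 9)),
                  columns=["exposure"] + confounder_names)
df["outcome"] = 2 * df["exposure"] + 1.5 * df["conf1"] + rng.normal(size=n)

kept = []                                  # confounders retained so far
for candidate in confounder_names:
    X = sm.add_constant(df[["exposure"] + kept + [candidate]])
    fit = sm.OLS(df["outcome"], X).fit()
    if fit.pvalues[candidate] < 0.05:      # modeling rule: keep if significant
        kept.append(candidate)
    # otherwise the candidate is dropped and never tried again

print("Retained confounders:", kept)
```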
Backward elimination: In this approach, the first model you run contains all your potential covariates, including all the candidate confounders and the exposure. Each time you run the model, you remove (eliminate) the confounder contributing the least to the model. You decide which one that is based on modeling rules you set (such as removing the confounder with the largest p value). Theoretically, after you pare away the confounders that do not meet the rules, you will have a final model. In practice, this process can run into problems if you have collinear covariates (see Chapters 17 and 18 for a discussion of collinearity). Your first model, packed with all your potential covariates, may error out or fail to converge for this reason. Also, it is not clear whether, once you eliminate a covariate, you should ever try it in the model again. This approach often sounds better on paper than it works in practice.
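Here is a matching sketch of backward elimination. It reuses the simulated data frame df and the imports from the previous sketch, and again treats a p value of 0.05 or higher as the (illustrative) rule for removing a confounder; the exposure itself is never a candidate for removal.

```python
confounders = [f"conf{i}" for i in range(1, 9)]
while confounders:
    X = sm.add_constant(df[["exposure"] + confounders])
    fit = sm.OLS(df["outcome"], X).fit()
    pvals = fit.pvalues[confounders]       # ignore the intercept and the exposure
    worst = pvals.idxmax()                 # confounder contributing the least
    if pvals[worst] < 0.05:                # everything left meets the rule, so stop
        break
    confounders.remove(worst)              # eliminate it and refit

print("Retained confounders:", confounders)
```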
Stepwise selection: This approach combines the best of forward stepwise and backward elimination. Starting with the same set of candidate covariates, you choose which covariate to introduce first into a model with the exposure. If this covariate meets the modeling rules, it is kept; if not, it is left out. This continues as if you are doing forward stepwise, but then there's a twist. After you are done trying each covariate and you have your forward stepwise model, you go back and try to add the covariates you left out, one by one. Each time one of them fits back in, you keep it and consider it part of the working model. It is during this phase that collinearity between covariates can become very apparent. After you have retried the covariates you originally left out and are satisfied that you added back the ones that meet the modeling rules, you can declare that you have arrived at the final model.
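A minimal sketch of the add-back phase appears below; it picks up where the forward stepwise sketch left off (the same simulated df, the kept list, and the illustrative p < 0.05 rule).

```python
candidates = [f"conf{i}" for i in range(1, 9)]
left_out = [c for c in candidates if c not in kept]   # dropped in the forward pass

for candidate in left_out:
    X = sm.add_constant(df[["exposure"] + kept + [candidate]])
    fit = sm.OLS(df["outcome"], X).fit()
    if fit.pvalues[candidate] < 0.05:      # it fits now that other covariates are in
        kept.append(candidate)             # add it back to the working model

final = sm.OLS(df["outcome"],
               sm.add_constant(df[["exposure"] + kept])).fit()
print("Final model covariates:", ["exposure"] + kept)
```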
Once you produce your final model, check the p value for the covariate or covariates representing your exposure. If they are not statistically significant, it means that your data did not support your hypothesis: after controlling for confounding, your exposure was not statistically significantly associated with the outcome. However, if the p value is statistically significant, then you would move on to interpret the results for your exposure covariates from your regression model. After controlling for confounding, your exposure was statistically significantly associated with your outcome. Yay!
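Continuing the sketch above, checking the exposure's p value in the final fitted model might look like this (the 0.05 cutoff is again an illustrative choice):

```python
p_exposure = final.pvalues["exposure"]
if p_exposure < 0.05:
    print(f"Exposure is statistically significant (p = {p_exposure:.3g})")
else:
    print(f"Exposure is not statistically significant (p = {p_exposure:.3g})")
```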
Use a spreadsheet to keep track of each model you run and a summary of its results. Save this in addition to your computer code for running the models. It can help you communicate with others about why certain covariates were or were not retained in your final model.
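If you prefer to build that tracking sheet programmatically, one possibility (continuing from the sketches above, with a hypothetical file name) is to log a summary row for each model and write the rows out as a CSV:

```python
model_log = []

def log_model(label, fit, note=""):
    """Record which covariates a fitted model used and how it was judged."""
    model_log.append({"model": label,
                      "covariates": ", ".join(fit.model.exog_names),
                      "n_observations": int(fit.nobs),
                      "note": note})

log_model("final model", final, note="all retained confounders met the p < 0.05 rule")
pd.DataFrame(model_log).to_csv("model_tracking.csv", index=False)   # hypothetical file name
```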